-
-
- CHAPTER NINE
-
- DEVELOPING AN SGML DATABASE FOR CD-ROM
-
-
-
- SGML PROOF-OF-CONCEPT PROTOTYPE
- DEFENSE MAPPING AGENCY
- DIGITAL SAILING DIRECTIONS
-
- Walter Klaus
- Defense Mapping Agency Systems Center
- Fairfax VA
-
- Ronald Hawkins
- Science Applications International Corporation
- McLean VA
-
-
- Abstract: The Defense Mapping Agency (DMA) is
- currently developing a Text Product Standard
- (TPS) to support the production, distribution
- and use of DMA publications in a digital
- environment [1]. Key features of the TPS
- include use of the Standard Generalized Markup
- Language (SGML) for text structuring and
- Compact Disc Read-Only Memory (CD-ROM) as the
- distribution medium. This paper provides an
- overview of the effort, including the TPS
- proof-of-concept prototype on CD-ROM.
-
-
- INTRODUCTION
-
- The TPS is one leg of a standardization triad under
- development by DMA for its digital data products. The
- other two legs are the Vector Product Standard (VPS) and
- the Raster Product Standard (RPS). VPS will be the
- standard for DMA's digital vector data products and is in
- the final stages of development. RPS will be the standard
- for raster and gridded data products and is in the
- preliminary stages of development. A short discussion
- follows.
- Besides producing paper maps, charts, and supporting
- publications for the Merchant Marine and the military
- services, DMA already produces some digital data products
- [2]. These products include vector, raster, and gridded
- Mapping, Charting, and Geodesy (MC&G) data supporting
- command, control, communications, and intelligence (C3I)
- systems, precision-guided weapon systems, simulators,
- training, scientific analyses and other applications.
- Examples of vector data products using the standard
- Vector Product Format (VPF) include the new Digital Chart
- of the World (DCW), a 1:1,000,000 scale worldwide
- database of geographic features; World Vector Shoreline
- (WVS), a 1:250,000 scale worldwide database of
- shorelines, international boundaries, and country names
- (now being converted to VPF); and the Digital Nautical
- Chart (DNC) for electronic navigation and chart display
- currently under development.
- Examples of raster data include scanned color images
- of paper maps and charts such as ARC Digitized Raster
- Graphics (ADRG). Gridded data includes Digital Terrain
- Elevation Data (DTED), a worldwide database of land
- elevations and Digital Bathymetric Data (DBD), a
- worldwide database of ocean depths.
- Of the products mentioned above, ADRG and DTED are
- currently distributed on CD-ROM; DCW will be distributed
- on CD-ROM shortly; and WVS, recently issued as a
- prototype CD-ROM, is now being converted to VPF. CD-ROM
- is an ideal distribution medium because of its low cost,
- high storage density, and adherence to ISO standards
- which provide computing platform independence. To facilitate the
- production and interchange of these data products and the
- interoperability of systems using them, DMA has embarked
- on this program to standardize data structures and
- formats.
- The TPS is the third major standard under
- development by DMA and is at the forefront of a paper-to-
- digital transition of DMA's publication development and
- distribution environment. DMA produces a number of
- publications related to mapping and charting. These
- include the Sailing Directions and Fleet Guides, which
- are voyage planning and navigation publications for
- Merchant Marine and U.S. Navy ships; the Chart Update
- Manual (CHUM), which provides listings, updates, and
- corrections to DMA aeronautical charts; and the
- Gazetteer, which is a publication containing place names
- and related information worldwide. Efforts are underway
- to produce digital versions of each of these textual
- publications; these efforts are described later in this
- paper. Developing a standard for digital text products
- benefits both the producer and the users. The producer
- benefits because the text and graphics data for each
- publication is based on standard data structures,
- allowing a common set of tools to be used for the entry
- and editing of this data and permitting the transfer of
- data among workstations at various stages in the
- production process. In addition, since DMA exchanges
- production data with other nations, adopting standard
- data structures will facilitate international exchange
- and support interoperability as well.
- The user benefits because development of the TPS
- will allow DMA to distribute text products as self-
- contained publications on CD-ROM. These "intelligent
- publications" will include text and graphics data
- integrated with retrieval software supporting access
- techniques such as browsing, query, and hypertext. In
- addition to providing an alternative to printed
- publications, the intelligent publication concept will
- support the integration of DMA's digital text products
- with other computerized information systems. Standards
- being considered and adopted under the TPS umbrella
- include data structures and data access language
- facilities which will support interoperability of
- intelligent publications and text databases with other
- standards-based digital data products.
- One of the key benefits of the DMA standardization
- program is the eventual integration of vector, raster,
- and text data in common systems. The capability to
- reference and view DMA publication data by pointing and
- clicking a mouse on a vector or raster map display offers
- significant future potential for C3I, navigation,
- training, and other automated systems.
-
-
- EXISTING STANDARDS
-
- DMA's formal hierarchy of standards, including existing
- and planned ISO and ANSI standards, forms the foundation
- of the TPS; this functionally layered hierarchy is
- reflected in the "TPS Meta-Model" (Fig. 1). The DoD
- CALS/CIM concepts influenced early TPS conception. The
- extensive use of formal and de facto standards supports a
- highly flexible interface architecture (Fig. 2) based on
- the OSE/OSI standard models, provides independence from
- storage (media) devices [3] and operating systems [4],
- and facilitates COTS-based engineering. The TPS will
- provide the capability to represent not only text, but
- also embedded graphics in 'standard' vector and raster
- formats. Standards which form the basis for TPS include:
-
- Standard Generalized Markup Language (SGML) (ISO 8879)
-
- It provides a standard means for defining and
- representing document structure using a set of 'tags'
- placed within the document text to signify structural
- components such as titles, chapters, paragraphs,
- illustrations, etc.; SGML is the core standard for the
- TPS [5].
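- As a rough illustration of how tags carry document
- structure, the Python sketch below recovers an element
- outline from a short tagged fragment. The element names
- (sector, heading, para) are hypothetical. They are not
- drawn from the TPS DTDs. Python's html.parser serves only
- as a lenient tag tokenizer. It is not an SGML parser and
- applies no DTD or tag-minimization rules.

```python
from html.parser import HTMLParser

# Hypothetical SGML-tagged fragment; element names are illustrative only.
SAMPLE = """<sector><heading>Honshu - South Coast</heading>
<para>Depths decrease gradually toward the shore.</para>
<para>Anchorage is available in the bay.</para></sector>"""


class OutlineBuilder(HTMLParser):
    """Records (nesting depth, element name) for each start tag, in order."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1


def outline(text):
    """Return the element outline of a tagged text as (depth, name) pairs."""
    builder = OutlineBuilder()
    builder.feed(text)
    builder.close()
    return builder.outline


if __name__ == "__main__":
    for depth, name in outline(SAMPLE):
        print("  " * depth + name)
```

- Run as a script, it prints "sector" followed by its
- heading and two paragraphs, each indented one level.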
- The development of common Document Type Definitions
- (DTDs) (the formal expression of document structure in
- SGML) will clearly define publication structure,
- providing for controlled change as well as validation and
- configuration management. Maintenance of change authority
- and history logs ('audit trails') will also be supported,
- an important consideration for many DMA publications.
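- Here is a minimal sketch of the validation a DTD enables.
- The Python code below checks an element's children against
- its content model, using hypothetical declarations. It
- handles only simple sequence groups with the ?, + and *
- occurrence indicators, with no "|" or "&" connectors and no
- nested groups. A real SGML parser does considerably more.

```python
import re

# Hypothetical content models, roughly corresponding to declarations such as
#   <!ELEMENT sector - - (heading, para+, figref*)>
#   <!ELEMENT figref - O EMPTY>
CONTENT_MODELS = {
    "sector": "(heading, para+, figref*)",
    "figref": "EMPTY",
}


def model_to_regex(model: str) -> re.Pattern:
    """Translate a simple sequence content model into a regex over child names."""
    if model == "EMPTY":
        return re.compile("")
    parts = []
    for token in model.strip("()").split(","):
        m = re.fullmatch(r"([a-z][a-z0-9.-]*)([?+*]?)", token.strip())
        if m is None:
            raise ValueError(f"unsupported content model token: {token!r}")
        name, occurrence = m.groups()
        # Each child name is matched as a whole word followed by one space.
        parts.append(f"(?:{re.escape(name)} ){occurrence}")
    return re.compile("".join(parts))


def is_valid(element: str, children: list[str]) -> bool:
    """True if the sequence of child element names satisfies the model."""
    pattern = model_to_regex(CONTENT_MODELS[element])
    return pattern.fullmatch("".join(name + " " for name in children)) is not None


if __name__ == "__main__":
    print(is_valid("sector", ["heading", "para", "figref"]))  # True
    print(is_valid("sector", ["heading", "figref"]))          # False: para+ needs one
```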
- A key feature of SGML is its capability to reference
- external entities from within the tagged text. This
- capability will support the eventual integration of DMA's
- textual products and vector/raster map data products. For
- example, an SGML-structured digital Sailing Directions
- could reference a DNC 'database'. The Sailing Directions
- user could, by pointing and clicking with a 'mouse', call
- up a section of DNC for his area of interest; he would
- then be viewing the same chart he uses to navigate,
- instead of a 'picture' of a chart, which is currently the
- case. This type of capability would lead to greater
- consistency between products, reduced chance for error,
- and eventual development of integrated voyage planning
- and navigation systems.
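- The external-entity mechanism can be sketched as a lookup
- from a declared entity name to a system identifier. In the
- Python example below, regular expressions stand in for an
- SGML parser. All of the following are hypothetical: the
- entity name, the "vpf" notation, the DNC path, and the
- chartref element. In a real DTD, chartref's entity
- attribute would be declared with type ENTITY.

```python
import re

# Hypothetical prolog and instance combined for brevity. In a real document the
# entity (and a matching <!NOTATION vpf ...> declaration) would sit in the DTD,
# and chartref's "entity" attribute would be declared with type ENTITY.
DOC = '''<!ENTITY dnc-tokyo SYSTEM "DNC/TOKYO_BAY" NDATA vpf>
<para>Approach Tokyo Wan from the south. <chartref entity="dnc-tokyo"></para>'''

ENTITY_DECL = re.compile(
    r'<!ENTITY\s+([\w.-]+)\s+SYSTEM\s+"([^"]*)"(?:\s+NDATA\s+([\w.-]+))?\s*>')
CHART_REF = re.compile(r'<chartref\s+entity="([\w.-]+)"\s*>')


def resolve_chart_refs(doc: str) -> list[tuple[str, str]]:
    """Map each chartref to (system identifier, notation) via the entity table."""
    entities = {name: (sysid, notation)
                for name, sysid, notation in ENTITY_DECL.findall(doc)}
    resolved = []
    for name in CHART_REF.findall(doc):
        if name not in entities:
            raise KeyError(f"undeclared entity: {name}")
        resolved.append(entities[name])
    return resolved


if __name__ == "__main__":
    # Assumption: a viewer could pass the system identifier to a chart display.
    print(resolve_chart_refs(DOC))  # [('DNC/TOKYO_BAY', 'vpf')]
```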
- SGML is also expected to provide a capability to
- interface publication databases used for in-house
- production to other DMA systems. An example is the
- marriage of digital Sailing Directions to DMA's
- Navigation Information Network (NAVINFONET), a special
- online service to mariners providing remote access to
- extensive maritime safety information. It can be queried
- from anywhere in the world via modern communications
- equipment on a 24-hour basis. It supports and supplements
- a number of navigation publications, including the
- Sailing Directions. SGML provides the capability to
- reference processing instructions which download Sailing
- Directions updates from the master database to the
- NAVINFONET computers; such an arrangement would provide
- faster and more accurate updates.
- An area in which SGML promises to be of great
- benefit is that of defining standard text data structures
- which can be used with intelligent retrieval systems.
- Systems currently available convert input textual data
- into a structure specific to the vendor's proprietary
- retrieval software; the data is then indexed for direct
- access and query capability. Some of these systems accept
- SGML as an input structure, but converted data is not
- SGML-structured. Retrieval systems could be developed
- which index SGML-structured data directly, eliminating
- the need to convert to vendor-specific structures. This
- would promote interoperability because data supplied on
- CD-ROM could be re-indexed for a different (or even
- multiple) retrieval engine(s), if desired by the end-
- user. The retrieval engine vendor's data preparation
- software would be used to add the necessary index files,
- but no data conversion would be required. Ultimately,
- standard index structures for SGML-tagged data should be
- developed to provide a completely open approach to text
- retrieval [6].
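- The sketch below makes direct indexing of SGML-structured
- data concrete. It builds an inverted index straight from
- tagged text, recording for each word the element path in
- which it occurs. Queries can then be limited to, for
- example, headings without converting the source. This is
- an assumed design for illustration only. It is not the
- approach of any retrieval product or of the CIAS draft
- [6]. Here again, html.parser is used only as a tag
- tokenizer.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class DirectIndexer(HTMLParser):
    """Builds word -> {element path} postings straight from tagged text."""

    def __init__(self):
        super().__init__()
        self.element_path = []
        self.postings = defaultdict(set)

    def handle_starttag(self, tag, attrs):
        self.element_path.append(tag)

    def handle_endtag(self, tag):
        if self.element_path and self.element_path[-1] == tag:
            self.element_path.pop()

    def handle_data(self, data):
        context = "/".join(self.element_path)
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.postings[word].add(context)


def build_index(tagged_text):
    """Index the tagged text as is; no conversion to another structure."""
    indexer = DirectIndexer()
    indexer.feed(tagged_text)
    indexer.close()
    return indexer.postings


def search(postings, word, within=None):
    """Element paths containing word, optionally limited to one element type."""
    paths = postings.get(word.lower(), set())
    return sorted(p for p in paths if within is None or p.split("/")[-1] == within)
```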
-
- Volume/File Structure of CD-ROM for Information
- Interchange (ISO 9660)
-
- It defines the physical organization of stored files and
- related volume directories on CD-ROM. It also supports
- the development of operating system-independent
- interfaces to data stored on CD-ROM media. CD-ROM was
- selected as a primary distribution medium because of its
- low cost, high storage density, platform independence,
- and its potential for integrating other DMA digital
- products [7].
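- The platform independence of ISO 9660 comes from fixed,
- documented on-disc structures. As a sketch, the Python code
- below scans the volume descriptor set for the Primary
- Volume Descriptor and reads three of its fields. The set
- begins at logical sector 16, using 2048-byte sectors. The
- disc image is a synthetic stand-in built in memory; a real
- image would be read from a file.

```python
import struct

SECTOR = 2048          # ISO 9660 logical sector size on CD-ROM
FIRST_VD_SECTOR = 16   # the volume descriptor set begins at sector 16


def find_primary_descriptor(image: bytes) -> bytes:
    """Scan the volume descriptor set for the Primary Volume Descriptor (type 1)."""
    sector = FIRST_VD_SECTOR
    while True:
        vd = image[sector * SECTOR:(sector + 1) * SECTOR]
        if len(vd) < SECTOR or vd[1:6] != b"CD001":
            raise ValueError("missing or malformed volume descriptor")
        if vd[0] == 1:      # Primary Volume Descriptor
            return vd
        if vd[0] == 255:    # Volume Descriptor Set Terminator
            raise ValueError("no primary volume descriptor")
        sector += 1


def describe_volume(image: bytes):
    """Return (volume identifier, volume space size in blocks, block size)."""
    vd = find_primary_descriptor(image)
    volume_id = vd[40:72].decode("ascii").rstrip(" ")
    # Numeric fields are recorded in both byte orders; read the little-endian half.
    (space_size,) = struct.unpack_from("<I", vd, 80)
    (block_size,) = struct.unpack_from("<H", vd, 128)
    return volume_id, space_size, block_size


def synthetic_image(volume_id: bytes, blocks: int) -> bytes:
    """Build a minimal in-memory image with a PVD and a terminator (demo only)."""
    image = bytearray(SECTOR * 18)
    pvd = bytearray(SECTOR)
    pvd[0], pvd[1:6], pvd[6] = 1, b"CD001", 1
    pvd[40:72] = volume_id.ljust(32, b" ")
    struct.pack_into("<I", pvd, 80, blocks)
    struct.pack_into(">I", pvd, 84, blocks)
    struct.pack_into("<H", pvd, 128, SECTOR)
    struct.pack_into(">H", pvd, 130, SECTOR)
    image[16 * SECTOR:17 * SECTOR] = pvd
    image[17 * SECTOR:17 * SECTOR + 7] = b"\xffCD001\x01"
    return bytes(image)
```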
-
- Tag Image File Format (TIFF)
-
- It is an industry (de facto) standard for raster images
- using a flexible tag and directory scheme which is
- extensible without sacrificing compatibility. TIFF will
- be used for raster drawings and illustrations [8].
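- TIFF's tag-and-directory scheme can be seen by reading the
- first Image File Directory (IFD) of a file. The Python sketch
- below reports the Compression tag (259). A value of 4
- denotes CCITT Group 4 (T.6) coding, the combination planned
- for TPS black-and-white images. The test file is a synthetic
- header with a one-entry IFD and no image data.

```python
import struct

COMPRESSION_TAG = 259
COMPRESSION_NAMES = {1: "none", 2: "CCITT modified Huffman RLE",
                     3: "CCITT Group 3", 4: "CCITT Group 4",
                     5: "LZW", 32773: "PackBits"}


def first_ifd(data: bytes):
    """Parse the header and first IFD; return (struct byte order, {tag: entry})."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("bad TIFF version number")
    (count,) = struct.unpack_from(order + "H", data, ifd_offset)
    entries = {}
    for i in range(count):
        # Each 12-byte entry: tag, field type, value count, value or offset.
        tag, ftype, n, raw = struct.unpack_from(order + "HHI4s", data, ifd_offset + 2 + 12 * i)
        entries[tag] = (ftype, n, raw)
    return order, entries


def compression_scheme(data: bytes) -> str:
    """Name the compression scheme declared in the first IFD."""
    order, entries = first_ifd(data)
    if COMPRESSION_TAG not in entries:
        return COMPRESSION_NAMES[1]  # absent tag means the default: no compression
    _, _, raw = entries[COMPRESSION_TAG]
    # A single SHORT value is stored left-justified in the 4-byte field.
    (code,) = struct.unpack_from(order + "H", raw, 0)
    return COMPRESSION_NAMES.get(code, f"unknown ({code})")


def minimal_tiff(order: str, compression: int) -> bytes:
    """Header plus a one-entry IFD; enough to exercise the parser (no pixels)."""
    mark = b"II" if order == "<" else b"MM"
    entry = struct.pack(order + "HHIH", COMPRESSION_TAG, 3, 1, compression) + b"\x00\x00"
    return (mark + struct.pack(order + "HI", 42, 8) + struct.pack(order + "H", 1)
            + entry + struct.pack(order + "I", 0))
```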
-
- CCITT-Group/4 compression (FIPS PUB 150)
-
- It provides a standard algorithm for the compression of
- black-and-white raster images. It will be used in
- conjunction with TIFF, providing standard compressed file
- structures for black-and-white images [9].
-
- Computer Graphics Metafile (CGM) (ISO 8632)
-
- It is a formal standard for the representation of vector
- graphics. It will be used for drawings and illustrations
- which are created or maintained as vector data [10].
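- The sketch below illustrates a standard vector
- representation on disc by walking the command headers of a
- binary-encoded CGM stream. The binary encoding is Part 3 of
- ISO 8632; reference [10] cites Part 1. Each command starts
- with a 16-bit word holding the element class (4 bits), the
- element ID (7 bits), and the parameter length in octets (5
- bits). A length of 31 signals a long form, which this
- sketch does not handle. Only a few element names are
- tabulated, and the sample stream is synthetic.

```python
import struct

# A few (element class, element id) pairs from the CGM element set.
ELEMENT_NAMES = {(0, 1): "BEGIN METAFILE", (0, 2): "END METAFILE",
                 (0, 3): "BEGIN PICTURE", (0, 4): "BEGIN PICTURE BODY",
                 (0, 5): "END PICTURE", (4, 1): "POLYLINE"}


def header(elem_class: int, elem_id: int, param_len: int) -> bytes:
    """Pack a short-form command header (parameter length 0..30 octets)."""
    if not 0 <= param_len <= 30:
        raise ValueError("long-form headers are not handled in this sketch")
    return struct.pack(">H", (elem_class << 12) | (elem_id << 5) | param_len)


def list_elements(data: bytes) -> list[str]:
    """Walk a binary-encoded stream and name each command found."""
    pos, names = 0, []
    while pos + 2 <= len(data):
        (word,) = struct.unpack_from(">H", data, pos)
        cls, eid, length = word >> 12, (word >> 5) & 0x7F, word & 0x1F
        if length == 31:
            raise ValueError("long-form command: not handled in this sketch")
        names.append(ELEMENT_NAMES.get((cls, eid), f"class {cls} id {eid}"))
        pos += 2 + length + (length % 2)  # odd-length parameter lists are padded
    return names


# Synthetic metafile: one picture containing one two-point polyline.
SAMPLE = (header(0, 1, 4) + b"\x03DSD"       # BEGIN METAFILE "DSD"
          + header(0, 3, 1) + b"\x00\x00"    # BEGIN PICTURE "" plus pad octet
          + header(0, 4, 0)                  # BEGIN PICTURE BODY
          + header(4, 1, 8) + struct.pack(">4h", 0, 0, 100, 50)  # POLYLINE
          + header(0, 5, 0)                  # END PICTURE
          + header(0, 2, 0))                 # END METAFILE
```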
-
- Data Access Languages
-
- These standards, still under development, attempt to define
- protocols for accessing data on CD-ROM from retrieval
- systems. Adoption of standard data access languages and
- client-server architectures will separate user
- applications ("front-ends") from text databases on CD-
- ROM, allowing different standard protocol-compliant
- applications to access the same database. This supports
- interoperability and reduces the requirement to have a
- unique user interface and means of accessing text data
- for every CD-ROM database which is produced. The
- situation is analogous to the use of Structured Query
- Language for accessing traditional databases. Emerging
- standards which are being considered include (among
- others) the Compact Disc Read Only Data Exchange Standard
- (CD-RDx) under development by the Intelligence Community
- [11] and the Structured Full Text Query Language (SFQL)
- under development by the Air Transport Association [12].
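- The separation of front ends from text databases can be
- sketched as an interface that any compliant back end
- implements. In the Python example below, TextDatabase is a
- hypothetical two-operation protocol. It is not the CD-RDx
- or SFQL interface, whose calls are not reproduced here. The
- same front-end function runs unchanged against two engines
- with different internals.

```python
from typing import Protocol


class TextDatabase(Protocol):
    """Hypothetical vendor-neutral access protocol (illustrative only)."""

    def search(self, term: str) -> list[str]: ...

    def fetch(self, doc_id: str) -> str: ...


class ScanEngine:
    """Back end 1: scans every document for the term on each query (no index)."""

    def __init__(self, docs: dict[str, str]):
        self.docs = docs

    def search(self, term: str) -> list[str]:
        return sorted(d for d, text in self.docs.items()
                      if term.lower() in text.lower().split())

    def fetch(self, doc_id: str) -> str:
        return self.docs[doc_id]


class IndexedEngine:
    """Back end 2: builds a word index up front; same protocol, other internals."""

    def __init__(self, docs: dict[str, str]):
        self.docs = docs
        self.index: dict[str, set[str]] = {}
        for doc_id, text in docs.items():
            for word in text.lower().split():
                self.index.setdefault(word, set()).add(doc_id)

    def search(self, term: str) -> list[str]:
        return sorted(self.index.get(term.lower(), set()))

    def fetch(self, doc_id: str) -> str:
        return self.docs[doc_id]


def show_hits(db: TextDatabase, term: str) -> list[str]:
    """A front end written only against the protocol, not against an engine."""
    return [f"{doc_id}: {db.fetch(doc_id)}" for doc_id in db.search(term)]


DOCS = {  # hypothetical passages keyed by invented section numbers
    "152-3.1": "Typhoon season runs from June to November",
    "158-1.2": "Anchorage in the bay is sheltered from typhoon winds",
}
```

- Because show_hits depends only on search and fetch, either
- engine can be swapped in without changing the front end.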
-
-
- PROTOTYPE DEVELOPMENT
-
- DMA is approaching the development of TPS with a
- prototyping effort designed to validate: (1) the
- standards and technologies and (2) the proposed
- client/server architecture (Fig. 3) selected for the TPS.
- It will also provide DMA publication users with a working
- version of a TPS-based electronic publication for
- evaluation and comment. The initial proof-of-concept
- prototype, including an intelligent publication on CD-ROM
- and a publication database consisting of SGML-tagged text
- and graphics in 'standard' formats, is planned for
- completion in May, 1992.
- The current, non-digital 47-volume series of Sailing
- Directions uses an oceanic basin concept and provides,
- for each basin, a planning guide with the oceanographic,
- meteorological, route, and other information required for
- an ocean passage. Three to seven enroute publications
- accompany each oceanic planning guide to provide textual
- and graphic information, including coastal views and
- photographs required for inshore navigation and port
- ingress.
- The current non-digital, two-volume Fleet Guide, a sister
- publication of the Sailing Directions, contains
- information designed to acquaint incoming naval ships
- with pertinent command, navigational, operational,
- repair, and logistical information on frequently visited
- ports in both the United States and foreign countries;
- there is one volume for the Atlantic Fleet and one for
- the Pacific Fleet. Much of the information contained in
- the Fleet Guide is similar to that found in applicable
- volumes of the Sailing Directions, but the Fleet Guide
- emphasizes areas of special interest to U.S. Navy ships
- such as command relationships, operational
- responsibilities, and munitions support capabilities.
- The Digital Sailing Directions proof-of-concept
- prototype data-set consists of the following publications
- on CD-ROM: the Planning Guide for the North Pacific (PUB
- 152), the Enroute Guides for Japan, Volumes I and II
- (PUBs 158 and 159) and Chapter 11 of the Pacific Fleet
- Guide (PUB 941).
- The Digital Sailing Directions prototype
- incorporates several key standards previously described,
- including SGML (ISO 8879), CD-ROM (ISO 9660/10149), TIFF,
- CCITT-Group/4 and CGM (ISO 8632). Technology areas which
- are still emerging and are not demonstrated in this
- prototype are direct indexing of SGML-compliant data for
- full-text retrieval and use of a standard data access
- language. Developments in these areas are being followed
- with the goal of incorporating them into future Digital
- Sailing Directions and other digital text data products
- [13].
- Development of the Digital Sailing Directions
- prototype began with the delivery of text in a tagged
- format used by the Government Printing Office. Graphics
- were provided in vendor-specific raster format. Analysis
- of the provided data revealed that the text data could be
- converted to the SGML structure relatively easily;
- however, graphics data presented a problem for two
- reasons: (1) the data structure was proprietary; and (2)
- the quality of many of the digital graphics was poor. In
- the end, a translator was found which could convert some
- of the graphics to a TIFF structure, but many had to be
- scanned from the original illustrations and photographs.
- Formal SGML Document Type Definitions (DTDs) were then
- developed for each of the publication types (Planning,
- Enroute, and Fleet Guides). Sailing
- Directions experts from DMA worked closely with the SGML
- consultant during this phase of the project. In addition
- to their experience in producing Sailing Directions
- publications, most of these experts have Merchant
- Marine or U.S. Navy backgrounds, with 'hands-on'
- experience in using the Sailing Directions for voyage
- planning. The participation of these experts was a key
- factor in properly defining structures embodied in the
- DTDs.
- Following development of the DTDs, processing
- instructions were written to convert and add SGML tags to
- the input text data. Manual proofing and editing were then
- conducted to finalize conversion of the text data.
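- The conversion step can be sketched as a table-driven
- translation from typesetting codes to SGML tags. Lines with
- unrecognized codes are set aside for manual proofing. The
- codes "@HD" and "@PA" below are invented stand-ins; the
- actual GPO code format is not given here. The element names
- are likewise hypothetical.

```python
# Hypothetical typesetting codes standing in for the GPO format, and
# hypothetical element names standing in for the DMA DTDs.
CODE_TO_ELEMENT = {"@HD": "heading", "@PA": "para"}


def escape_sgml(text: str) -> str:
    """Protect markup-significant characters with numeric character references."""
    return text.replace("&", "&#38;").replace("<", "&#60;")


def convert(lines, wrapper="sector"):
    """Tag coded lines; return (SGML text, line numbers needing manual review)."""
    out, flagged = [f"<{wrapper}>"], []
    for number, line in enumerate(lines, start=1):
        code, _, text = line.partition(" ")
        element = CODE_TO_ELEMENT.get(code)
        if element is None:
            flagged.append(number)  # unknown code: leave for manual proofing
            continue
        out.append(f"<{element}>{escape_sgml(text.strip())}</{element}>")
    out.append(f"</{wrapper}>")
    return "\n".join(out), flagged
```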
- Graphics were converted from the vendor-specific
- format or scanned as necessary. Most graphics were
- converted to TIFF with CCITT-Group/4 compression, but at
- least one CGM vector graphic was included to demonstrate
- the technology.
- The resulting database of text and graphics was then
- integrated with a proprietary SGML-based retrieval
- system. Processing instructions were developed to index
- the SGML-tagged data for use with this retrieval system
- and a customized user interface was designed for Sailing
- Directions display.
- The Digital Sailing Directions prototype operates on
- an IBM-compatible personal computer running the MS-DOS
- operating system and the Microsoft Windows 3.0 graphical
- user interface; both VGA and Super VGA displays are
- supported. To allow smooth operation of the Windows
- interface and relatively quick retrieval and display of
- graphics, a minimum of an 80386 microprocessor at 25 MHz
- and 4 MB of system RAM is recommended. However, the prototype
- has been successfully demonstrated with lesser
- configurations.
- The Digital Sailing Directions prototype CD-ROM
- includes two distinct components. The first consists of
- the SGML compliant dataset described above; it is
- included to demonstrate the capability for electronic
- interchange of publications using SGML. The data can be
- accessed using an SGML publishing system. The second
- component is a complete, intelligent publication
- consisting of the same prototype data-set; however, it
- provides the additional capability to browse through the
- publication and submit queries (e.g., word searches).
- Hypertext capabilities are also provided; graphics may be
- viewed by using a mouse to point and click on highlighted
- references in the text; additionally, the user can move
- between sections of the publications by pointing and
- clicking on highlighted cross-references.
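- The hypertext links described above map naturally onto
- SGML's ID and IDREF attribute types. The Python sketch
- below assumes hypothetical xref and figref elements with a
- target attribute, and it uses regular expressions in place
- of a parser. It resolves each reference to the element it
- points at and lists any dangling references. This is one
- way such links could be checked before release.

```python
import re

# Hypothetical markup: elements carry an id attribute (SGML type ID);
# <xref> and <figref> carry a target attribute (SGML type IDREF).
ID_ATTR = re.compile(r'<(\w+)[^>]*?\bid="([^"]+)"')
REF_ATTR = re.compile(r'<(?:xref|figref)\b[^>]*?\btarget="([^"]+)"')


def resolve_links(doc: str):
    """Return ({reference: target element name}, [dangling references])."""
    targets = {ident: element for element, ident in ID_ATTR.findall(doc)}
    resolved, dangling = {}, []
    for ref in REF_ATTR.findall(doc):
        if ref in targets:
            resolved[ref] = targets[ref]
        else:
            dangling.append(ref)
    return resolved, dangling


DOC = ('<section id="s11-2"><heading>Yokosuka</heading>'
       '<para>See <xref target="s11-5">; view <figref target="fig-7">.</para></section>'
       '<section id="s11-5"><heading>Berthing</heading></section>'
       '<graphic id="fig-7" file="fig7.tif">')
```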
- The initial Digital Sailing Directions prototype is
- planned to be released to the military services and other
- designated evaluators in June, 1992. Concept planning is
- underway for a product prototype which may include the
- entire Sailing Directions (or a large subset) and
- incorporate evaluation comments from the concept
- prototype.
- In other related efforts, DMA recently completed
- development of a Digital Gazetteer (DG) prototype on CD-
- ROM. A second prototype currently being planned will
- demonstrate the integration of vector map graphics (based
- on the VPS) with the Digital Gazetteer text database.
- Development of an Electronic Chart Update Manual (ECHUM)
- prototype on CD-ROM is also being planned. Both the DG
- and ECHUM products are potential candidates for TPS-
- compliant text products and functional integration with
- related digital products.
-
-
- FUTURE DIRECTIONS
-
- Although current information technology standards permit
- development of a proof-of-concept prototype for TPS,
- significant work must still be accomplished in several
- areas; technology areas which require advancement to
- eliminate 'dependencies' are: (1) direct indexing of
- SGML-tagged data for use with retrieval systems; (2)
- development of standard (abstract) index specifications
- for text retrieval; and (3) acceptance and implementation of
- standard data access languages for text retrieval
- applications (Fig. 4). However, the core standards for
- the structuring and exchange of text and graphics,
- including SGML, TIFF, CGM, and CD-ROM make the production
- and distribution of intelligent publications on CD-ROM a
- viable proposition in the near future.
-
-
- REFERENCES
-
- [1] Klaus, W., Memorandum - Proposed Text Product
- Standard (TPS) for CD-ROM, Defense Mapping Agency, 8 May
- 1991.
- [2] Defense Mapping Agency (DMA) - Digitizing the Future
- (3rd Ed.), 1991.
- [3] International Standard (ISO 10149), Information
- Processing - Data Interchange on Read-Only 120mm Optical
- Data Discs (CD-ROM), 1989.
- [4] International Standard (ISO 9945), Information
- Processing - Portable Operating System Interface (POSIX),
- 1990.
- [5] International Standard (ISO 8879), Information
- Processing - Standard Generalized Markup Language (SGML),
- 1986.
- [6] United States Air Force (USAF), Computer Resource
- Management Technology Program - (Draft) CD-ROM Index
- Architecture Specification (CIAS), 1990.
- [7] International Standard (ISO 9660), Information
- Processing - Volume and File Structure of CD-ROM for
- Information Interchange, 1988.
- [8] The Microsoft and Aldus Corporations - Tag Image File
- Format (TIFF) Specification (V 5.0), 1988.
- [9] Facsimile Coding Schemes and Coding Control Functions
- for CCITT-Group/4 Facsimile Apparatus, 1988.
- [10] International Standard (ISO 8632-1), Information
- Processing - Computer Graphics Metafile (CGM) for the
- Storage and Transfer of Picture Description Information,
- 1987.
- [11] Director Central Intelligence (DCI/IHC/ICS) -
- (Proposed) CD-ROM Read-only Data EXchange (CD-RDx)
- Standard (V 3.11), August 1991.
- [12] Air Transport Association (ATA) Specification-100,
- Manufacturers Technical Data, Digital Data Standards,
- October 1990.
- [13] Defense Mapping Agency (DMA) - Sailing Directions
- Product Specifications (1st Ed.), 1977 & (Draft) Product
- Specifications, 1990.
-
-
- Related graphics for this paper: image files KLA01.pcx
- through KLA12.pcx.
-
-
-
-